5 research outputs found

    Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies.

    Expression quantitative trait loci (eQTL) studies are an integral tool for investigating the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at http://ml.sheffield.ac.uk/qtl/.

    Evaluation of PANAMA and alternative methods on the simulated eQTL dataset.

    <p>(<b>a,b</b>) Number of recovered <i>cis</i> and <i>trans</i> associations as a function of the chosen false discovery rate cutoff. To circumvent biases due to linkage, at most one association per chromosome and gene is counted. (<b>c</b>) Receiver Operating Characteristics (ROC) for recovering true simulated associations, depicting the true positive rate (TPR) as a function of the permitted false positive rate (FPR). (<b>d</b>) Inflation factors, indicating either inflated p-value distributions or deflation of the respective test statistics. (<b>e</b>) Area under the ROC curve for alternative simulated datasets, subsampling certain fractions of the number of simulated <i>trans</i> associations. (<b>f</b>) Area under the ROC curve for alternative simulated datasets, subsampling the number of simulated confounding factors.</p>
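
The exact definition of the inflation factor used in panel (d) is not legible in this caption. As a rough illustration only, the sketch below computes the widely used genomic-control inflation factor, which is assumed here rather than taken from the paper: the median of the observed 1-d.f. chi-square statistics divided by the median expected under the null. The function name `genomic_inflation` is hypothetical. Values above 1 suggest inflated p-value distributions, and values below 1 suggest deflation.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def genomic_inflation(p_values):
    """Genomic-control inflation factor (an assumed, conventional definition).

    Each p-value must lie in (0, 1]. It is mapped to a chi-square statistic
    with 1 degree of freedom through the squared normal quantile of p / 2.
    """
    stats = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of a chi-square(1) variable under the null, roughly 0.455.
    null_median = _STD_NORMAL.inv_cdf(0.25) ** 2
    return median(stats) / null_median


# Evenly spread p-values mimic a well-calibrated test.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
# Squaring pushes p-values towards zero, mimicking inflation.
inflated_p = [p ** 2 for p in uniform_p]

lambda_uniform = genomic_inflation(uniform_p)
lambda_inflated = genomic_inflation(inflated_p)
```

With these inputs, `lambda_uniform` should sit close to 1 and `lambda_inflated` well above it.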

    Illustration of the PANAMA model.

    <p>(<b>a</b>) Representation of the linear model used by PANAMA to correct for the effect of confounding factors. (<b>b</b>) Alternative settings of confounders in relation to true genetic signals. First, orthogonality between confounders and genetics: the variation in the gene expression levels (green arrow) can be better explained by the SNP (blue arrow). Second, statistical overlap between the variation explained by confounders and the genetic variation, as is often found in <i>trans</i> hotspots: gene expression variation can be explained equally well as genetic or as due to a confounding factor. Previous methods focus on the first setting, while PANAMA is able to handle both situations. (<b>c</b>) PANAMA applied to the yeast eQTL dataset. Pronounced <i>trans</i> regulators that overlap with the learnt confounding factors are highlighted in red.</p>
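
The figure itself is not reproduced in this listing. As a hedged illustration, a generic linear model of the kind described in panel (a) can be written as below. The notation is assumed here and is not taken from the paper: the expression profile of gene g across samples is modelled as a mean, plus contributions from a set of selected SNPs, plus contributions from hidden confounding factors, plus noise.

```latex
\mathbf{y}_g = \mu_g \mathbf{1}
  + \sum_{n=1}^{N} \beta_{g,n}\,\mathbf{s}_n
  + \sum_{k=1}^{K} w_{g,k}\,\mathbf{x}_k
  + \boldsymbol{\epsilon}_g
```

Here the vectors s_n stand for genotypes of the selected SNPs, the vectors x_k for the unobserved confounding factors, and beta and w for the corresponding weights. In the second setting of panel (b), some x_k is strongly correlated with some s_n, so the two sums compete to explain the same variation.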

    Detecting regulatory gene–environment interactions with unmeasured environmental factors

    Motivation: Genomic studies have revealed a substantial heritable component of the transcriptional state of the cell. To fully understand the genetic regulation of gene expression variability, it is important to study the effect of genotype in the context of external factors such as alternative environmental conditions. In model systems, explicit environmental perturbations have been considered for this purpose, allowing direct tests for environment-specific genetic effects. However, such experiments are limited to species that can be profiled in controlled environments, hampering their use in important systems such as humans. Moreover, even in seemingly tightly regulated experimental conditions, subtle environmental perturbations cannot be ruled out, and hence unknown environmental influences are frequent. Here, we propose a model-based approach to simultaneously infer unmeasured environmental factors from gene expression profiles and use them in genetic analyses, identifying environment-specific associations between polymorphic loci and individual gene expression traits. Results: In extensive simulation studies, we show that our method is able to accurately reconstruct environmental factors and their interactions with genotype in a variety of settings. We further illustrate the use of our model in a real-world dataset in which one environmental factor has been explicitly experimentally controlled. Our method is able to accurately reconstruct the true underlying environmental factor even if it is not given as an input, which allows genuine genotype-environment interactions to be detected. In addition to the known environmental factor, we find unmeasured factors involved in novel genotype-environment interactions. Our results suggest that interactions with both known and unknown environmental factors significantly contribute to gene expression variability. Availability and implementation: Software available at http://pmbio.github.io/envGPLVM/.
Supplementary Information: Supplementary data are available at Bioinformatics online.